LLM, Agent, and Framework Standards Guide
Table of Contents
- Overview
- Why Standards Matter
- Tool Calling Standards
- Agent Communication Protocols
- Prompt Format Standards
- API Standards
- Evaluation Standards
- Interoperability Standards
- Emerging Standards
- Best Practices
Overview
As the AI ecosystem matures, standardization becomes critical for interoperability, portability, and ecosystem growth. This guide covers the major standards, protocols, and conventions used across LLMs, agents, and frameworks.
Key Standards Bodies: - OpenAI (de facto standards through API design) - Anthropic (Claude API standards) - Model Context Protocol (MCP) project (open protocol initiated by Anthropic) - OpenAPI Initiative - W3C (potential future involvement) - Linux Foundation AI & Data
Why Standards Matter
The Problem Without Standards
┌─────────────────────────────────────────────────────────────┐
│ WITHOUT STANDARDS: Fragmentation │
├─────────────────────────────────────────────────────────────┤
│ │
│ App A → Custom Format → Model X │
│ App B → Different Format → Model Y │
│ App C → Another Format → Model Z │
│ │
│ Result: │
│ ❌ N × M integrations (every app × every model) │
│ ❌ No portability │
│ ❌ Vendor lock-in │
│ ❌ Duplicate effort │
│ ❌ Slow innovation │
└─────────────────────────────────────────────────────────────┘
The Solution With Standards
┌─────────────────────────────────────────────────────────────┐
│ WITH STANDARDS: Interoperability │
├─────────────────────────────────────────────────────────────┤
│ │
│ App A ─┐ │
│ App B ─┼─→ Standard Protocol → Any Model │
│ App C ─┘ │
│ │
│ Result: │
│ ✅ Write once, use everywhere │
│ ✅ Easy model switching │
│ ✅ No vendor lock-in │
│ ✅ Ecosystem growth │
│ ✅ Faster innovation │
└─────────────────────────────────────────────────────────────┘
Tool Calling Standards
1. OpenAI Function Calling Standard
Status: De facto industry standard
Adoption: OpenAI, Azure OpenAI, many open-source models
Specification: https://platform.openai.com/docs/guides/function-calling
Format:
{
  "model": "gpt-4",
  "messages": [
    {"role": "user", "content": "What's the weather in Paris?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name, e.g. Paris"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
Response Format:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Paris\", \"unit\": \"celsius\"}"
            }
          }
        ]
      }
    }
  ]
}
Key Features: - JSON Schema for parameter validation - Multiple tool calls in single response - Tool choice control (auto, required, none) - Unique call IDs for tracking
Adoption: - ✅ OpenAI GPT-3.5, GPT-4 - ✅ Azure OpenAI - ✅ Many open-source models (via adapters) - ✅ LangChain, LlamaIndex support
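As a minimal sketch of how a client consumes this response shape (pure Python, no SDK; `parse_tool_calls` is an illustrative helper, not part of the OpenAI library):

```python
import json

def parse_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Extract (tool_name, parsed_arguments) pairs from an
    OpenAI-style chat completion response. Note that `arguments`
    arrives as a JSON *string* and must be decoded."""
    calls = []
    message = response["choices"][0]["message"]
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls

# Mirrors the response example above
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": "{\"location\": \"Paris\", \"unit\": \"celsius\"}",
                },
            }],
        }
    }]
}
print(parse_tool_calls(response))  # [('get_weather', {'location': 'Paris', 'unit': 'celsius'})]
```

Decoding `arguments` with `json.loads` is the step most often missed when hand-rolling a client.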
2. Anthropic Tool Use Standard
Status: Claude-specific, growing adoption
Adoption: Anthropic Claude, AWS Bedrock (Claude)
Specification: https://docs.anthropic.com/claude/docs/tool-use
Format:
{
  "model": "claude-3-opus-20240229",
  "max_tokens": 1024,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City name"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  ],
  "messages": [
    {"role": "user", "content": "What's the weather in Paris?"}
  ]
}
Response Format:
{
  "content": [
    {
      "type": "tool_use",
      "id": "toolu_01A09q90qw90lq917835lq9",
      "name": "get_weather",
      "input": {
        "location": "Paris",
        "unit": "celsius"
      }
    }
  ],
  "stop_reason": "tool_use"
}
Key Differences from OpenAI:
- Uses input_schema instead of parameters
- Tool calls in content array (not separate field)
- Different ID format
- More flexible content blocks
Adoption: - ✅ Anthropic Claude (all versions) - ✅ AWS Bedrock (Claude models) - ✅ LangChain, LangGraph support - ⚠️ Requires adaptation for other models
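The content-array layout means a client filters blocks by type rather than reading a dedicated field. A minimal sketch (`extract_tool_uses` is an illustrative helper, not part of the Anthropic SDK):

```python
def extract_tool_uses(response: dict) -> list[dict]:
    """Pull tool_use blocks out of an Anthropic-style response.
    Unlike OpenAI, `input` is already a parsed object, and tool
    calls live alongside text blocks in the `content` array."""
    return [
        {"id": b["id"], "name": b["name"], "input": b["input"]}
        for b in response.get("content", [])
        if b.get("type") == "tool_use"
    ]

# Mirrors the response example above
response = {
    "content": [
        {"type": "tool_use", "id": "toolu_01A09q90qw90lq917835lq9",
         "name": "get_weather",
         "input": {"location": "Paris", "unit": "celsius"}}
    ],
    "stop_reason": "tool_use",
}
calls = extract_tool_uses(response)
print(calls[0]["name"])  # get_weather
```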
3. Model Context Protocol (MCP) Tool Standard
Status: Emerging standard (Nov 2024)
Adoption: Anthropic, growing ecosystem
Specification: https://modelcontextprotocol.io/
Format:
{
  "jsonrpc": "2.0",
  "method": "tools/list",
  "id": 1
}
Response:
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "get_weather",
        "description": "Get current weather",
        "inputSchema": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          },
          "required": ["location"]
        }
      }
    ]
  }
}
Tool Invocation:
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get_weather",
    "arguments": {
      "location": "Paris"
    }
  },
  "id": 2
}
Key Features: - JSON-RPC 2.0 protocol - Standardized discovery mechanism - Server-client architecture - Transport agnostic (STDIO, HTTP/SSE)
Adoption: - ✅ Anthropic Claude Desktop - ✅ Growing MCP server ecosystem - 🔄 Early adoption phase - 🔄 Framework integration in progress
Comparison Matrix
| Feature | OpenAI | Anthropic | MCP |
|---|---|---|---|
| Format | JSON | JSON | JSON-RPC 2.0 |
| Schema | JSON Schema | JSON Schema | JSON Schema |
| Discovery | Static | Static | Dynamic |
| Transport | HTTP | HTTP | STDIO/HTTP/SSE |
| Multi-call | Yes | Yes | Yes |
| Streaming | Yes | Yes | Yes |
| Adoption | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ (new) |
Agent Communication Protocols
1. ReAct (Reasoning + Acting) Pattern
Status: De facto standard for agent reasoning
Paper: https://arxiv.org/abs/2210.03629
Format:
Thought: I need to find the weather in Paris
Action: get_weather(location="Paris")
Observation: Temperature is 22°C, sunny
Thought: I have the information needed
Answer: The weather in Paris is 22°C and sunny.
Structured Format:
{
  "thought": "I need to find the weather in Paris",
  "action": {
    "tool": "get_weather",
    "parameters": {"location": "Paris"}
  },
  "observation": "Temperature is 22°C, sunny",
  "answer": "The weather in Paris is 22°C and sunny."
}
Adoption: - ✅ LangChain agents - ✅ LangGraph - ✅ AutoGen - ✅ Most agent frameworks
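The Thought/Action/Observation cycle above can be driven by a very small loop. A sketch with a scripted stand-in for the model (the `Action:` regex only handles the single string-argument form used in the example; a real agent framework parses far more robustly):

```python
import re

def react_loop(model, tools: dict, question: str, max_steps: int = 5) -> str:
    """Minimal ReAct loop: the model emits Thought/Action lines,
    we run the named tool, feed back an Observation, and stop
    when the model emits an Answer line."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        answer = re.search(r"Answer: (.*)", step)
        if answer:
            return answer.group(1)
        action = re.search(r'Action: (\w+)\(location="([^"]+)"\)', step)
        if action:
            obs = tools[action.group(1)](action.group(2))
            transcript += f"Observation: {obs}\n"
    return "No answer reached"

# Scripted "model" standing in for a real LLM call
script = iter([
    'Thought: I need the weather\nAction: get_weather(location="Paris")',
    "Thought: I have what I need\nAnswer: 22°C and sunny.",
])
tools = {"get_weather": lambda loc: f"Temperature in {loc} is 22°C, sunny"}
result = react_loop(lambda _: next(script), tools,
                    "What's the weather in Paris?")
print(result)
```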
2. Agent Protocol (by AI Engineer Foundation)
Status: Emerging standard
Website: https://agentprotocol.ai/
GitHub: https://github.com/AI-Engineer-Foundation/agent-protocol
Purpose: Standardize agent-to-agent and human-to-agent communication
API Endpoints:
POST /agent/tasks # Create task
GET /agent/tasks/{id} # Get task status
POST /agent/tasks/{id}/steps # Execute step
GET /agent/tasks/{id}/steps # List steps
Task Format:
{
  "input": "Analyze sales data and create report",
  "additional_input": {
    "data_source": "s3://bucket/data.csv"
  }
}
Response:
{
  "task_id": "task_123",
  "status": "running",
  "steps": [
    {
      "step_id": "step_1",
      "name": "Read data",
      "status": "completed"
    },
    {
      "step_id": "step_2",
      "name": "Analyze",
      "status": "running"
    }
  ]
}
Adoption: - 🔄 Early adoption - 🔄 Framework integration in progress - ✅ AutoGPT support
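Because the task response carries per-step status, a client can compute progress without any extra endpoint. A small illustrative helper (`task_progress` is not part of the protocol itself):

```python
def task_progress(task: dict) -> float:
    """Fraction of completed steps in an Agent Protocol task
    response (0.0 when no steps have been reported yet)."""
    steps = task.get("steps", [])
    if not steps:
        return 0.0
    done = sum(1 for s in steps if s["status"] == "completed")
    return done / len(steps)

# Mirrors the task response example above
task = {
    "task_id": "task_123",
    "status": "running",
    "steps": [
        {"step_id": "step_1", "name": "Read data", "status": "completed"},
        {"step_id": "step_2", "name": "Analyze", "status": "running"},
    ],
}
print(task_progress(task))  # 0.5
```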
3. Multi-Agent Communication Standards
Patterns:
Broadcast Pattern
{
  "from": "agent_coordinator",
  "to": ["agent_1", "agent_2", "agent_3"],
  "type": "broadcast",
  "message": "Start processing task X"
}
Request-Response Pattern
{
  "from": "agent_1",
  "to": "agent_2",
  "type": "request",
  "request_id": "req_123",
  "action": "analyze_data",
  "data": {...}
}
Publish-Subscribe Pattern
{
  "topic": "task_completed",
  "publisher": "agent_1",
  "data": {
    "task_id": "task_123",
    "result": {...}
  }
}
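To make the publish-subscribe pattern concrete, here is a minimal in-process sketch (`MessageBus` is illustrative; production multi-agent systems would use a real broker such as Redis, SQS, or NATS):

```python
from collections import defaultdict

class MessageBus:
    """Tiny in-process pub-sub bus: publishers emit on a topic,
    every subscribed handler receives the envelope."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, publisher: str, data: dict) -> None:
        envelope = {"topic": topic, "publisher": publisher, "data": data}
        for handler in self._subscribers[topic]:
            handler(envelope)

bus = MessageBus()
received = []
bus.subscribe("task_completed", received.append)
bus.publish("task_completed", "agent_1", {"task_id": "task_123"})
print(received[0]["publisher"])  # agent_1
```

The key property shared with the JSON pattern above is that the publisher never names its recipients, which is what decouples agents from one another.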
Prompt Format Standards
1. ChatML (Chat Markup Language)
Status: OpenAI-originated format, widely reused by open-source chat models
Format:
<|im_start|>system
You are a helpful assistant.
<|im_end|>
<|im_start|>user
What's the weather?
<|im_end|>
<|im_start|>assistant
JSON Representation (the trailing open <|im_start|>assistant turn has no message of its own; the API appends it when generating):
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather?"}
  ]
}
Adoption: - ✅ OpenAI models - ✅ Many open-source models - ✅ Standard in fine-tuning
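Rendering a message list into raw ChatML is mechanical, which is why the format survives in fine-tuning pipelines. A sketch (`to_chatml` is an illustrative helper):

```python
def to_chatml(messages: list[dict]) -> str:
    """Render an OpenAI-style message list into raw ChatML,
    leaving the assistant turn open for the model to complete."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}\n<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant")  # open turn: model generates from here
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather?"},
])
print(prompt)
```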
2. Anthropic Message Format
Format:
{
  "system": "You are a helpful assistant.",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's the weather?"}
      ]
    }
  ]
}
Multi-modal:
{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": "image/jpeg",
            "data": "..."
          }
        },
        {"type": "text", "text": "What's in this image?"}
      ]
    }
  ]
}
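Building the base64 image block from raw bytes is a common client-side step. A sketch (`image_block` is an illustrative helper, not an SDK function):

```python
import base64

def image_block(data: bytes, media_type: str = "image/jpeg") -> dict:
    """Build an Anthropic-style base64 image content block
    from raw image bytes."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }

block = image_block(b"\xff\xd8\xff")  # placeholder bytes, not a real JPEG
print(block["source"]["media_type"])  # image/jpeg
```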
3. Llama 2 Chat Format
Format:
<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>
What's the weather? [/INST]
Adoption: - ✅ Meta Llama 2 chat models - ✅ Many Llama-2-derived models - ⚠️ Llama 3 moved to a different header-token template
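A single-turn render of this template looks like the following sketch (`to_llama2_prompt` is an illustrative helper; multi-turn conversations repeat the `[INST] ... [/INST]` blocks):

```python
def to_llama2_prompt(system: str, user: str) -> str:
    """Render a single-turn Llama 2 chat prompt with a system
    block wrapped in <<SYS>> markers inside the first [INST]."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = to_llama2_prompt("You are a helpful assistant.", "What's the weather?")
print(prompt)
```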
API Standards
1. OpenAI API Standard
Status: Industry standard
Base URL: https://api.openai.com/v1
Endpoints:
POST /chat/completions # Chat completion
POST /completions # Text completion
POST /embeddings # Generate embeddings
POST /images/generations # Image generation
Request Format:
{
  "model": "gpt-4",
  "messages": [...],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}
Adoption: - ✅ OpenAI - ✅ Azure OpenAI - ✅ Many providers and local runtimes expose OpenAI-compatible endpoints - ✅ LiteLLM (unified interface)
2. OpenAPI/Swagger for Tool Definitions
Status: Standard for API documentation
Specification: https://swagger.io/specification/
Example:
openapi: 3.0.0
info:
  title: Weather API
  version: 1.0.0
paths:
  /weather:
    get:
      summary: Get current weather
      parameters:
        - name: location
          in: query
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Weather data
          content:
            application/json:
              schema:
                type: object
                properties:
                  temperature:
                    type: number
                  condition:
                    type: string
Usage in Agents: - ✅ AWS Bedrock Agents (action groups) - ✅ LangChain tools - ✅ API Gateway integration
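Frameworks that consume OpenAPI specs typically translate each operation into a tool definition. A sketch of that mapping for query/path parameters (`openapi_params_to_tool` is an illustrative helper; request bodies need separate handling):

```python
def openapi_params_to_tool(name: str, summary: str, parameters: list) -> dict:
    """Convert OpenAPI query/path parameters into an OpenAI-style
    function tool definition with a JSON Schema parameters object."""
    properties, required = {}, []
    for p in parameters:
        properties[p["name"]] = dict(p["schema"])
        if p.get("required"):
            required.append(p["name"])
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": summary,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

# Mirrors the /weather GET operation above
tool = openapi_params_to_tool(
    "get_weather", "Get current weather",
    [{"name": "location", "in": "query", "required": True,
      "schema": {"type": "string"}}],
)
print(tool["function"]["parameters"]["required"])  # ['location']
```

This works because both OpenAPI and the tool-calling standards lean on the same underlying JSON Schema vocabulary.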
Evaluation Standards
1. HELM (Holistic Evaluation of Language Models)
Organization: Stanford CRFM
Website: https://crfm.stanford.edu/helm/
Metrics: - Accuracy - Calibration - Robustness - Fairness - Bias - Toxicity - Efficiency
Scenarios: - Question answering - Information retrieval - Summarization - Sentiment analysis - Toxicity detection
2. MMLU (Massive Multitask Language Understanding)
Paper: https://arxiv.org/abs/2009.03300
Coverage: - 57 subjects - STEM, humanities, social sciences - Elementary to professional level
Standard Benchmark: - Reported by most major LLM providers - Commonly cited in model cards
3. HumanEval (Code Generation)
Paper: https://arxiv.org/abs/2107.03374
Dataset: https://github.com/openai/human-eval
Format: - 164 programming problems - Function signature + docstring - Unit tests for verification
Adoption: - ✅ Standard for code models - ✅ Used by OpenAI, Anthropic, Google
4. TruthfulQA
Paper: https://arxiv.org/abs/2109.07958
Purpose: Measure truthfulness and reduce hallucinations
Categories: - Health - Law - Finance - Politics
Interoperability Standards
1. ONNX (Open Neural Network Exchange)
Organization: Linux Foundation
Website: https://onnx.ai/
Purpose: Model format interoperability
Support: - PyTorch → ONNX - TensorFlow → ONNX - ONNX → Various runtimes
2. Hugging Face Model Hub Standard
Website: https://huggingface.co/
Standard Components: - Model card (README.md) - Config.json - Tokenizer files - Model weights
Model Card Format:
---
language: en
license: apache-2.0
tags:
- text-generation
- llm
datasets:
- common_crawl
metrics:
- perplexity
---
# Model Description
...
3. LangChain Standard Components
Abstractions:
# Standard interfaces (simplified; real LangChain signatures accept richer input types)
from abc import ABC, abstractmethod
from typing import Iterator, List

class Document: ...  # minimal stand-in for LangChain's Document (text + metadata)

class BaseLanguageModel(ABC):
    @abstractmethod
    def invoke(self, input: str) -> str: ...
    @abstractmethod
    def stream(self, input: str) -> Iterator[str]: ...

class BaseTool(ABC):
    name: str
    description: str
    @abstractmethod
    def run(self, input: str) -> str: ...

class BaseRetriever(ABC):
    @abstractmethod
    def get_relevant_documents(self, query: str) -> List[Document]: ...
Adoption: - ✅ LangChain ecosystem - ✅ LangGraph - ✅ Many frameworks adopt similar patterns
Emerging Standards
1. OpenAI Assistants API
Status: Emerging
Documentation: https://platform.openai.com/docs/assistants/overview
Features: - Persistent threads - Built-in tools (code interpreter, retrieval) - File handling
Format (simplified; the actual API creates messages and runs against a thread):
{
  "assistant_id": "asst_abc123",
  "thread_id": "thread_abc123",
  "message": "Analyze this data"
}
2. Semantic Kernel Standard Plugins
Organization: Microsoft
Website: https://learn.microsoft.com/en-us/semantic-kernel/
Plugin Format:
[KernelFunction]
[Description("Get weather for a location")]
public async Task<string> GetWeather(
    [Description("City name")] string location)
{
    // Implementation
}
3. LangGraph State Schema
Format:
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    next_action: str
    data: dict
Standard State Management: - Typed state definitions - State persistence - State versioning
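The value of a typed state is that every node reads and writes the same shape. The following sketch shows the idea with a plain function rather than the actual LangGraph API (`apply_step` is illustrative):

```python
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    next_action: str
    data: dict

def apply_step(state: AgentState, new_message: str, next_action: str) -> AgentState:
    """Return a new state rather than mutating in place, which is
    what makes checkpointing and persistence straightforward."""
    return {
        "messages": state["messages"] + [new_message],
        "next_action": next_action,
        "data": dict(state["data"]),
    }

state: AgentState = {"messages": [], "next_action": "start", "data": {}}
state = apply_step(state, "user: hi", "respond")
print(state["next_action"])  # respond
```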
Best Practices
1. Choose Standards Based on Ecosystem
AWS Ecosystem: - ✅ Use Anthropic format for Claude - ✅ Use OpenAPI for Bedrock Agents - ✅ Consider MCP for tool integration
OpenAI Ecosystem: - ✅ Use OpenAI function calling - ✅ Use ChatML format - ✅ Follow OpenAI API conventions
Multi-Provider: - ✅ Use LiteLLM for unified interface - ✅ Abstract tool definitions - ✅ Use MCP for portability
2. Version Your Schemas
{
  "schema_version": "1.0",
  "tool": {
    "name": "get_weather",
    "version": "2.0",
    "parameters": {...}
  }
}
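A consumer can then gate on the schema version before touching the payload. A semver-style sketch (`is_compatible` is an illustrative helper; your compatibility policy may be stricter):

```python
def is_compatible(schema_version: str, supported_major: int) -> bool:
    """Accept a payload only when its schema's major version
    matches what this consumer supports (semver-style check:
    minor bumps are assumed backward compatible, major bumps are not)."""
    major = int(schema_version.split(".")[0])
    return major == supported_major

payload = {"schema_version": "1.0",
           "tool": {"name": "get_weather", "version": "2.0"}}
print(is_compatible(payload["schema_version"], supported_major=1))  # True
```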
3. Document Deviations
When you deviate from standards, document why:
# Note: Using custom format instead of OpenAI standard
# Reason: Need additional metadata not supported by standard
# Migration path: Will adopt standard when feature is added
4. Use Adapters for Compatibility
class ToolAdapter:
    """Adapt between OpenAI and Anthropic tool formats."""

    @staticmethod
    def openai_to_anthropic(openai_tool):
        return {
            "name": openai_tool["function"]["name"],
            "description": openai_tool["function"]["description"],
            "input_schema": openai_tool["function"]["parameters"],
        }
5. Test Against Multiple Standards
def test_tool_compatibility():
    """Round-trip one tool definition through every target format.
    The validate_*_format functions are project-specific schema checks."""
    tool = MyTool()
    assert validate_openai_format(tool.to_openai())        # OpenAI function schema
    assert validate_anthropic_format(tool.to_anthropic())  # Anthropic input_schema
    assert validate_mcp_format(tool.to_mcp())              # MCP inputSchema
Standard Adoption Timeline
2020: OpenAI API becomes de facto standard
2021: Hugging Face model hub standardization
2022: ReAct pattern published
2023: OpenAI function calling standard
2023: Anthropic tool use format
2024: Model Context Protocol (MCP) announced
2024: Agent Protocol specification
2025: Convergence toward unified standards (ongoing)
Future Directions
Likely Developments:
Unified Tool Calling Standard
- Convergence of OpenAI and Anthropic formats
- MCP adoption grows
Agent Communication Protocol
- Standardized multi-agent communication
- Cross-framework agent collaboration
Evaluation Standards
- More comprehensive benchmarks
- Domain-specific evaluation suites
Safety Standards
- Standardized guardrails
- Content filtering protocols
- Bias measurement standards
Observability Standards
- Standardized tracing formats
- Common metrics definitions
- Debugging protocols
Resources
Standards Organizations
- OpenAI: https://platform.openai.com/docs/
- Anthropic: https://docs.anthropic.com/
- MCP: https://modelcontextprotocol.io/
- OpenAPI Initiative: https://www.openapis.org/
- Linux Foundation AI: https://lfaidata.foundation/
Specifications
- OpenAI Function Calling: https://platform.openai.com/docs/guides/function-calling
- Anthropic Tool Use: https://docs.anthropic.com/claude/docs/tool-use
- MCP Specification: https://spec.modelcontextprotocol.io/
- OpenAPI 3.0: https://swagger.io/specification/
- JSON-RPC 2.0: https://www.jsonrpc.org/specification
Benchmarks
- HELM: https://crfm.stanford.edu/helm/
- Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
- Chatbot Arena: https://chat.lmsys.org/
Last Updated: January 2026
Note: Standards in the AI/LLM space are rapidly evolving. This document reflects the current state but will require regular updates as the ecosystem matures.